9 research outputs found

    Accelerating BPC-PaCo through visually lossless techniques

    Fast image codecs are needed in applications that deal with large numbers of images. Graphics Processing Units (GPUs) are suitable processors to speed up most kinds of algorithms, especially when they allow fine-grain parallelism. Bitplane Coding with Parallel Coefficient processing (BPC-PaCo) is a recently proposed algorithm for the core stage of wavelet-based image codecs, tailored for the highly parallel architectures of GPUs. This algorithm provides complexity scalability to allow faster execution at the expense of coding efficiency. Its main drawback is that the speedup and the loss in image quality are controlled only roughly, resulting in visible distortion at low and medium rates. This paper addresses this issue by integrating techniques of visually lossless coding into BPC-PaCo. The resulting method minimizes the visual distortion introduced in the compressed file, yielding images that appear of higher quality to a human observer. Experimental results also indicate speedups of 12% with respect to BPC-PaCo.

    Complexity scalable bitplane image coding with parallel coefficient processing

    Very fast image and video codecs are a goal pursued in both academia and industry. This paper presents a complexity scalable and parallel bitplane coding engine for wavelet-based image codecs. The proposed method processes the coefficients in parallel, suiting hardware architectures based on vector instructions. Our previous work is extended with a mechanism that provides complexity scalability to the system. This feature allows the coder to regulate the throughput achieved at the expense of slightly penalizing compression efficiency. Experimental results suggest that, when using the fastest speed, the method almost doubles the throughput of our previous engine while penalizing compression efficiency by about 10

    GPU-oriented architecture for an end-to-end image/video codec based on JPEG2000

    Modern image and video compression standards employ computationally intensive algorithms that provide advanced features to the coding system. Current standards often need to be implemented in hardware or using expensive solutions to meet the real-time requirements of some environments. Contrary to this trend, this paper proposes an end-to-end codec architecture running on inexpensive Graphics Processing Units (GPUs) that is based on, though not compatible with, the JPEG2000 international standard for image and video compression. When executed on a commodity Nvidia GPU, it achieves real-time processing of 12K video. The proposed software architecture uses four CUDA kernels that minimize memory transfers, use registers instead of shared memory, and employ a double-buffer strategy to optimize the streaming of data. The throughput analysis indicates that the proposed codec yields results at least 10× superior on average to those achieved with JPEG2000 implementations devised for CPUs, and approximately 4× superior to those achieved with hardwired solutions of the HEVC/H.265 video compression standard.

    High Throughput Image/Video Codec with Nvidia GPUs

    The increasing amount of image and video content, and the adoption of 8K resolution and high dynamic range technologies, demand faster and more efficient digital coding solutions to store and transfer these data. State-of-the-art solutions like HEVC or JPEG2000 are widely adopted, but their computational requirements pose a challenge even for current hardware. In environments like digital cinema or medical imaging, dedicated FPGA boards are used to accelerate image processing without affecting image quality. In recent years, a type of massively parallel hardware has started to gain traction: Graphics Processing Units (GPUs). GPUs are massively parallel architectures originally suited for video games or 3D simulations. In recent years, their adoption as general-purpose devices has allowed them to be used as accelerators for a myriad of applications. Algorithms properly adapted to run on GPUs achieve significant throughput improvements compared with their CPU implementations. This research focuses on creating an end-to-end codec based on the JPEG2000 standard and tailored for GPUs. This thesis proposes five main contributions, all of which have been published in relevant conferences or journals. The first one focuses on the first version of the end-to-end GPU codec, which can code and decode gray-scale images. The second version includes the implementation of the video engine within the codec, which can process up to two frames simultaneously. The third contribution consists of an in-depth analysis of the end-to-end codec with multiple throughput improvements and the addition of a multi-frame processing approach, which allows multiple frames to be processed simultaneously when coding video. The fourth contribution proposes an improvement to the core coding engine, tested on a CPU version of the end-to-end codec. 
The last contribution details an in-depth analysis of the improvement presented in the previous paper, now implemented in the end-to-end GPU codec, including results showing more than 10× the performance of the best commercial JPEG2000 implementation when processing 4K RGB video.
The growing amount of content, in both image and video, and the rapid adoption of 8K resolution, along with high dynamic range (HDR) technologies, are creating the need for more efficient and faster digital coding solutions to transmit and store these data. The latest and most advanced solutions, such as HEVC or JPEG2000, have been adopted quickly and are widespread, but their computational requirements pose a challenge even for cutting-edge hardware. This is why environments such as digital cinema or medical image processing use FPGA devices to accelerate image processing and coding without reducing quality. In recent years, a type of massively parallel architecture that is very efficient in this kind of environment has come into use: graphics processing units (GPUs), commonly known as graphics cards. GPUs are massively parallel processing units originally designed to render 3D environments or video games. In recent years, they have begun to be used as more general-purpose devices, serving as accelerators in a myriad of fields. Algorithms that are adapted efficiently and accurately to run on graphics cards achieve a significant performance increase compared with their earlier CPU implementations. This research aims to create an image and video codec that is based on the JPEG2000 standard but runs entirely on GPUs. 
This thesis comprises five publications, which have appeared in internationally relevant conferences or journals. The first article focuses on the first version of this end-to-end codec for GPUs, capable of compressing and decompressing grayscale images. The second version of the codec includes a video processing engine capable of processing two frames in parallel. The third contribution explores the end-to-end codec in detail, with many throughput improvements over the previous version and the ability to process a larger number of frames in parallel when coding video. The fourth contribution includes a specific improvement to the core engine, tested on a CPU version. The last contribution carries this improvement over to the end-to-end GPU codec and analyzes it from several points of view. The results of the last part of this research show that the codec can run up to 10 times faster than the best commercial JPEG2000 implementation when processing 4K color video.

    Real-time 16K video coding on a GPU with complexity scalable BPC-PaCo

    Other funding: CRUE-CSIC transformative agreement.
    The advent of new technologies such as high dynamic range or 8K screens has enhanced the quality of digital images, but it has also increased the computational demands on codecs that process such data. This paper presents a video codec that, while providing the same coding features and performance as those of JPEG2000, can process 16K video in real time using a consumer-grade GPU. This high throughput is achieved with a technique that introduces complexity scalability to a bitplane coding engine, which is the most computationally complex stage of the coding pipeline. The resulting codec can trade throughput for coding performance depending on the user's needs. Experimental results suggest that our method can double the throughput achieved by CPU implementations of the recently approved High-Throughput JPEG2000 and by hardwired implementations of HEVC in a GPU.

    ESICM LIVES 2016: part two : Milan, Italy. 1-5 October 2016.

    Meeting abstrac